screening rate
Safe Screening Rules for Group OWL Models
Bao, Runxue, Lu, Quanchao, Zhang, Yanfu
Group Ordered Weighted $L_{1}$-Norm (Group OWL) regularized models have emerged as a useful procedure for high-dimensional sparse multi-task learning with correlated features. Proximal gradient methods are used as standard approaches to solving Group OWL models. However, Group OWL models usually suffer huge computational costs and memory usage when the feature size is large in the high-dimensional scenario. To address this challenge, in this paper, we are the first to propose the safe screening rule for Group OWL models by effectively tackling the structured non-separable penalty, which can quickly identify the inactive features that have zero coefficients across all the tasks. Thus, by removing the inactive features during the training process, we may achieve substantial computational gain and memory savings. More importantly, the proposed screening rule can be directly integrated with the existing solvers both in the batch and stochastic settings. Theoretically, we prove our screening rule is safe and also can be safely applied to the existing iterative optimization algorithms. Our experimental results demonstrate that our screening rule can effectively identify the inactive features and leads to a significant computational speedup without any loss of accuracy.
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Arizona (0.04)
- Asia > Middle East > Jordan (0.04)
Analyzing Geospatial and Socioeconomic Disparities in Breast Cancer Screening Among Populations in the United States: Machine Learning Approach
Hashtarkhani, Soheil, Zhou, Yiwang, Kumsa, Fekede Asefa, White-Means, Shelley, Schwartz, David L, Shaban-Nejad, Arash
Breast cancer screening plays a pivotal role in early detection and subsequent effective management of the disease, impacting patient outcomes and survival rates. This study aims to assess breast cancer screening rates nationwide in the United States and investigate the impact of social determinants of health on these screening rates. Data on mammography screening at the census tract level for 2018 and 2020 were collected from the Behavioral Risk Factor Surveillance System. We developed a large dataset of social determinants of health, comprising 13 variables for 72337 census tracts. Spatial analysis employing Getis-Ord Gi statistics was used to identify clusters of high and low breast cancer screening rates. To evaluate the influence of these social determinants, we implemented a random forest model, with the aim of comparing its performance to linear regression and support vector machine models. The models were evaluated using R2 and root mean squared error metrics. Shapley Additive Explanations values were subsequently used to assess the significance of variables and direction of their influence. Geospatial analysis revealed elevated screening rates in the eastern and northern United States, while central and midwestern regions exhibited lower rates. The random forest model demonstrated superior performance, with an R2=64.53 and root mean squared error of 2.06 compared to linear regression and support vector machine models. Shapley Additive Explanations values indicated that the percentage of the Black population, the number of mammography facilities within a 10-mile radius, and the percentage of the population with at least a bachelor's degree were the most influential variables, all positively associated with mammography screening rates.
- North America > United States > Tennessee > Shelby County > Memphis (0.05)
- North America > United States > Texas (0.04)
- North America > United States > New York (0.04)
- (9 more...)
Safe Triplet Screening for Distance Metric Learning
Yoshida, Tomoki, Takeuchi, Ichiro, Karasuyama, Masayuki
We study safe screening for metric learning. Distance metric learning can optimize a metric over a set of triplets, each one of which is defined by a pair of same class instances and an instance in a different class. However, the number of possible triplets is quite huge even for a small dataset. Our safe triplet screening identifies triplets which can be safely removed from the optimization problem without losing the optimality. Compared with existing safe screening studies, triplet screening is particularly significant because of (1) the huge number of possible triplets, and (2) the semi-definite constraint in the optimization. We derive several variants of screening rules, and analyze their relationships. Numerical experiments on benchmark datasets demonstrate the effectiveness of safe triplet screening.
- Asia > Japan (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (4 more...)
Simultaneous Safe Screening of Features and Samples in Doubly Sparse Modeling
Shibagaki, Atsushi, Karasuyama, Masayuki, Hatano, Kohei, Takeuchi, Ichiro
The problem of learning a sparse model is conceptually interpreted as the process of identifying active features/samples and then optimizing the model over them. Recently introduced safe screening allows us to identify a part of non-active features/samples. So far, safe screening has been individually studied either for feature screening or for sample screening. In this paper, we introduce a new approach for safely screening features and samples simultaneously by alternatively iterating feature and sample screening steps. A significant advantage of considering them simultaneously rather than individually is that they have a synergy effect in the sense that the results of the previous safe feature screening can be exploited for improving the next safe sample screening performances, and vice-versa. We first theoretically investigate the synergy effect, and then illustrate the practical advantage through intensive numerical experiments for problems with large numbers of features and samples.